#depth prediction
24/07/2025
Can GPT-4o Truly See? Benchmarking Multimodal Models on Visual Understanding
A recent EPFL study benchmarks multimodal foundation models such as GPT-4o on core vision tasks, finding strengths in semantic understanding but gaps relative to specialized vision models.